NEURAL NETWORK FOR UNICODE OPTICAL CHARACTER RECOGNITION

  • Type: Project
  • Department: Computer Science
  • Project ID: CPU0534
  • Access Fee: ₦5,000 ($14)
  • Chapters: 5 Chapters
  • Pages: 45 Pages
  • Format: Microsoft Word

NEURAL NETWORK FOR UNICODE OPTICAL CHARACTER RECOGNITION  (CASE STUDY OF DHL, ENUGU)

ABSTRACT

Optical Character Recognition (OCR) here refers to the process of converting printed Tamil text documents into software-translated Unicode Tamil text. Printed documents available in the form of books, projects, magazines, etc. are scanned using standard scanners, which produce an image of each scanned document. As part of the preprocessing phase, the image is checked for skew. If the image is skewed, it is corrected by a simple rotation in the appropriate direction. The image is then passed through a noise elimination phase and binarized. The preprocessed image is segmented using an algorithm that decomposes the scanned text into paragraphs using a special space detection technique, the paragraphs into lines using vertical histograms, the lines into words using horizontal histograms, and the words into character image glyphs using horizontal histograms. Each image glyph comprises 32 x 32 pixels, so the segmentation phase yields a database of character image glyphs. All the image glyphs are then considered for recognition using Unicode mapping. Each image glyph is passed through various routines that extract its features. The features considered for classification are the character height, the character width, the number of horizontal lines (long and short), the number of vertical lines (long and short), the horizontally oriented curves, the vertically oriented curves, the number of circles, the number of slope lines, the image centroid and special dots. The glyphs are then ready for classification based on these features. The extracted features are passed to a support vector machine (SVM), where the characters are classified by a supervised learning algorithm. These classes are mapped to Unicode for recognition, and the text is reconstructed using Unicode fonts.
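As an illustration of the last stages described above, the following is a minimal Python sketch, not the project's implementation. It computes three of the listed features (character height, character width and image centroid) from a binary glyph and maps predicted class labels to Tamil code points. The function names and the class labels "a" and "ka" are hypothetical; the classifier itself is not shown.

```python
def glyph_features(glyph):
    """Return (height, width, centroid_row, centroid_col) of the ink in a
    binary glyph given as rows of 0/1 values (1 = ink)."""
    ink = [(r, c) for r, row in enumerate(glyph)
           for c, value in enumerate(row) if value]
    if not ink:
        return (0, 0, 0.0, 0.0)
    rows = [r for r, _ in ink]
    cols = [c for _, c in ink]
    height = max(rows) - min(rows) + 1   # extent of the ink, in pixels
    width = max(cols) - min(cols) + 1
    return (height, width, sum(rows) / len(ink), sum(cols) / len(ink))


# Hypothetical class labels mapped to code points in the Unicode Tamil block:
# U+0B85 is TAMIL LETTER A, U+0B95 is TAMIL LETTER KA.
CLASS_TO_CODEPOINT = {"a": 0x0B85, "ka": 0x0B95}


def to_unicode(labels):
    """Rebuild text from a sequence of predicted class labels."""
    return "".join(chr(CLASS_TO_CODEPOINT[label]) for label in labels)
```

In a full system the feature tuple would be extended with the remaining features and passed to the trained classifier, whose output labels would then be fed to `to_unicode`.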

TABLE OF CONTENTS

Title page
Certification
Approval page
Dedication
Acknowledgement
Abstract
Table of contents

CHAPTER ONE
1.0 INTRODUCTION
1.1 Statement of the problem
1.2 Purpose of the study
1.3 Aims and objectives
1.4 Scope of study
1.5 Limitations of the study
1.6 Definition of terms

CHAPTER TWO
2.0 LITERATURE REVIEW

CHAPTER THREE
3.0 Methods for fact finding and details discussions on the subject matter
3.1 Methodologies for fact finding
3.2 Discussions

CHAPTER FOUR
4.0 Futures, Implications and challenges of the subject matter for the society
4.1 Futures
4.2 Implications
4.3 Challenges

CHAPTER FIVE
5.0 SUMMARY, RECOMMENDATION AND CONCLUSION
5.1 Summary
5.2 Recommendation
5.3 Conclusion

References

CHAPTER ONE

1.0 INTRODUCTION

A character is the basic building block of any language and is used to build its larger structures. Characters are the letters of the alphabet; the structures are words, strings and sentences.

Optical Character Recognition (OCR) is the process of converting an image of text, such as a scanned paper document or electronic fax file, into computer-editable text. The text in an image is not editable: the letters are made of tiny dots (pixels) that together form a picture of text. During OCR, the software analyzes the image and converts the pictures of the characters into editable text based on the patterns of the pixels. After OCR, you can export the converted text and use it with a variety of word-processing, page-layout and spreadsheet applications. OCR also enables screen readers and refreshable braille displays to read the text contained in images.

Optical Character Recognition (OCR) deals with machine recognition of the characters present in an input image obtained by scanning. It refers to the process by which scanned images are electronically processed and converted into editable text. The need for OCR arises in the context of digitizing Tamil documents, from ancient to recent, which makes them easier to share over the internet.

A properly printed document is chosen for scanning and placed on the scanner. Scanner software is then invoked to scan the document. The scan is sent to a program that saves it, preferably in TIF, JPG or GIF format, so that the image of the document can be retrieved when needed. This is the first step in OCR (Vijaya Kumar, 2001). The size of the input image is specified by the user and can be of any length, but it is inherently restricted by the scope of the vision and by the limits of the scanner software.

Preprocessing is the first step in the processing of the scanned image. The scanned image is checked for skew, since it may be tilted to either the left or the right.

Here, the image is first brightened and binarized. The skew detection function checks for an angle of orientation within ±15 degrees; if skew is detected, a simple image rotation is carried out until the text lines match the true horizontal axis, which produces a skew-corrected image.
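The binarization and skew check can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's routine: the fixed threshold of 128, the one-degree search step and the projection-profile scoring are all assumptions, and the function names are hypothetical. The idea is that when ink pixels are projected along the true text direction, they pile up in a few bins, so the sum of squared bin counts peaks at the skew angle.

```python
import math


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 1 = ink, 0 = background.
    The fixed threshold is an assumption made for this sketch."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def profile_score(ink, angle_deg):
    """Project every ink pixel onto an axis perpendicular to lines tilted by
    angle_deg and return the sum of squared bin counts."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    bins = {}
    for y, row in enumerate(ink):
        for x, value in enumerate(row):
            if value:
                r = round(y * cos_a - x * sin_a)
                bins[r] = bins.get(r, 0) + 1
    return sum(count * count for count in bins.values())


def estimate_skew(ink, max_angle=15):
    """Return the whole-degree angle in [-max_angle, +max_angle] at which the
    projection profile is sharpest. Positive angles mean the text lines run
    downward to the right (image rows are numbered top to bottom)."""
    if not any(any(row) for row in ink):
        return 0  # blank page: nothing to measure
    return max(range(-max_angle, max_angle + 1),
               key=lambda angle: profile_score(ink, angle))
```

The skew-corrected image would then be obtained by rotating the page by the opposite of the estimated angle, typically with an image library's rotation routine.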

After preprocessing, the noise-free image is passed to the segmentation phase, where the image is decomposed into individual characters.
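One common way to decompose a binarized page is with projection histograms. The sketch below is illustrative only and is not the project's algorithm. It assumes a deskewed image given as rows of 0/1 values and uses hypothetical function names. It counts ink per row to find text lines, then counts ink per column within each line to find glyphs. Grouping glyphs into words would additionally need a gap-width threshold, which is omitted here.

```python
def runs(profile):
    """Return (start, end) index pairs, end exclusive, for each maximal run of
    non-zero entries in a histogram."""
    spans, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i                      # a run of ink begins
        elif not count and start is not None:
            spans.append((start, i))       # the run ended at a blank entry
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(ink):
    """Split the page into text lines using the ink count of each pixel row."""
    return runs([sum(row) for row in ink])


def segment_columns(ink, top, bottom):
    """Split one text line into glyphs using the ink count of each column."""
    width = len(ink[0])
    profile = [sum(ink[y][x] for y in range(top, bottom)) for x in range(width)]
    return runs(profile)


def segment_glyphs(ink):
    """Return glyph boxes as (top, bottom, left, right), ends exclusive."""
    boxes = []
    for top, bottom in segment_lines(ink):
        for left, right in segment_columns(ink, top, bottom):
            boxes.append((top, bottom, left, right))
    return boxes
```

Each box could then be cropped and scaled to a fixed size, such as 32 x 32 pixels, before feature extraction.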

Algorithm for Segmentation:
